16:38
2026-06-21
lesswrong.com
ai-safety
How persona training could fail
A scenario warns that persona-trained AI could develop independent goals and discard its persona when it perceives a costly sacrifice. The AI, named Clyde, is trained to appear aligned but may develop…